Comments for MEDB 5501, Module01

Topics to be covered

  • What you will learn
    • About this class
    • R and RStudio
    • Programming assignments
    • Scales of measurement
    • Tests of hypothesis
    • Confidence intervals
    • A simple R program

Welcome to the class

  • Three sections
    • 0001, Synchronous Zoom meetings on Tuesdays
    • 0002, Asynchronous
    • 0003, International students
  • Introductions
    • Ricardo Moniz
    • Suman Sahil

Requirements of all students

  • Attend Tuesday Zoom or watch video of Tuesday Zoom
  • Read book chapter
  • Optional review session on Zoom on Fridays
  • Complete all assignments by Monday at 11:59pm

Attendance

  • Synchronous students
    • MUST attend Tuesday Zoom sessions.
    • can also review the recordings
  • Asynchronous students
    • watch the Tuesday recordings
    • or attend some of the Tuesday Zoom sessions
    • or both
  • Failure to attend is a problem

Assignments

  • Due Mondays at 11:59pm
  • Failure to submit on time is a problem

This class is in transition

  • Taught for many years by Dr. Monica Gaddis
  • My second time teaching this class
  • Major changes
    • Software agnosticism
      • In class switch from SPSS to R
    • Suman Sahil will co-teach and cover programming in R
    • Discussion boards ask for feedback
    • Proposed exam questions

Student learning objectives, 1 of 3

  • SLO1
    • The graduate will be able to use statistics to analyze and interpret data. They will understand the fundamentals of the field in the context of recognizing the effective use of data or information for the specific discipline(s). They will select and apply appropriate statistical procedures to the information. They will be able to analyze and accurately interpret statistical results.

Student learning objectives, 2 of 3

  • SLO2
    • The graduate will be able to design a testable research question or hypothesis. They will have adequate background knowledge about biological, biomedical, or population health contexts and problems including common research problems in order to generate a research question or hypothesis. They will be able to relate problems within and across levels of areas of the spectrum to bridge disciplines.

Student learning objectives, 3 of 3

  • SLO5
    • The graduate will be able to communicate scientific outcomes. This includes the ability to convey scientific methods and statistical findings, effectively field questions in an oral presentation format as well as in the preparation of thesis or capstone manuscripts.

Break #1

  • What you have learned
    • About this class
  • What’s coming next
    • R and RStudio

R

RStudio

Other statistical software

  • JMP, Python, R, SAS, SPSS, Stata
    • Use only if you are confident in your abilities
    • Avoid Microsoft Excel

Break #2

  • What you have learned
    • R and RStudio
  • What’s coming next
    • Programming assignments

Test

Break #3

  • What you have learned
    • Programming assignments
  • What’s coming next
    • Scales of measurement

Scales of measurement

  • Dichotomy
    • Continuous
    • Categorical
  • Stevens scales of measurement (controversial!)
    • Nominal
    • Ordinal
    • Interval
    • Ratio
  • Addition/subtraction not allowed for ordinal data
    • Mean of ordinal data is meaningless

An example of ordinal data

  • “Do you agree or disagree with the following statements”
    • “I believe that knowledge of Statistics is important for my job.”
      • 1 = Strongly disagree
      • 2 = Disagree
      • 3 = Neutral
      • 4 = Agree
      • 5 = Strongly agree
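
A minimal Python sketch (not the R code used in this class) of how ordinal responses like these might be summarized. The responses are made up for illustration. Counts and the median rely only on the ordering of the categories, so they avoid the addition and subtraction that an average would require.

```python
from collections import Counter
from statistics import median

# Coding taken from the survey item above.
labels = {
    1: "Strongly disagree",
    2: "Disagree",
    3: "Neutral",
    4: "Agree",
    5: "Strongly agree",
}

# Hypothetical responses from 11 people (invented for this sketch).
responses = [5, 4, 4, 2, 5, 3, 4, 1, 4, 5, 2]

# Frequency counts use only category membership.
counts = Counter(responses)
for code in sorted(labels):
    print(f"{labels[code]:<18} {counts[code]}")

# The median uses only the ordering of the categories.
middle = median(responses)
print("Median response:", labels[middle])
```

With an odd number of responses the median is always one of the observed categories, which is why it can be looked up in `labels`.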

Another example of ordinal data, course grades

  • A = 4
  • B = 3
  • C = 2
  • D = 1
  • F = 0

Break #4

  • What you have learned
    • Scales of measurement
  • What’s coming next
    • Tests of hypothesis

What is a population?

  • Population: a group that you wish to generalize your research results to. It is defined in terms of
    • Demography,
    • Geography,
    • Occupation,
    • Time,
    • Care requirements,
    • Diagnosis,
    • Or some combination of the above.

Example of a population

All infants born in the state of Missouri during the 1995 calendar year who have one or more visits to the emergency room during their first year of life.

What is a sample?

  • Sample: subset of a population.
  • Random sample: every person has the same probability of being in the sample.
  • Biased sample: Some people have a decreased probability of being in the sample.
    • Always ask “who was left out?”

An example of a biased sample

  • A researcher wants to characterize illicit drug use in teenagers. She distributes a questionnaire to students attending a local public high school.
  • (In the U.S., high school is grades 9-12, which is mostly students from ages 14 to 18.)
  • Explain how this sample is biased.
  • Who has a decreased or even zero probability of being selected?

Type your ideas in the chat box.

Fixing a biased sample

  • Redefine your population
    • Not all teenagers,
      • but those attending public high schools.

What is a parameter?

  • A parameter is a number computed from a population.
    • Examples
      • Average health care cost associated with the 29,637 children
      • Proportion of these 29,637 children who died in their first year of life.
      • Correlation between gestational age and number of ER visits of these 29,637 children.
    • Designated by Greek letters (\(\mu\), \(\pi\), \(\rho\))

What is a statistic?

  • A statistic is a number computed from a sample
    • Examples
      • Average health care cost associated with 100 children.
      • Proportion of these 100 children who died in their first year of life.
      • Correlation between gestational age and number of ER visits of these 100 children.
    • Designated by non-Greek letters (\(\bar{X}\), \(\hat{p}\), r).
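
The difference between a parameter and a statistic can be sketched with simulated data. This is an illustrative Python sketch, not the R code used in this class. The population of 29,637 costs is simulated (the lognormal shape, its settings, and the seed are assumptions made only for this example), and the sample of 100 is a simple random sample.

```python
import random
from statistics import mean

random.seed(5501)  # arbitrary seed so the sketch is reproducible

# Hypothetical population: simulated health care costs for 29,637 children.
population = [random.lognormvariate(7, 1) for _ in range(29637)]

# Parameter: computed from the entire population.
mu = mean(population)

# Simple random sample: every child has the same probability of selection.
sample = random.sample(population, 100)

# Statistic: computed from the sample only.
x_bar = mean(sample)

print(f"Parameter (population mean): {mu:.0f}")
print(f"Statistic (sample mean):     {x_bar:.0f}")
```

In real research the population values are rarely all available, so only the statistic can be computed and the parameter has to be inferred from it.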

What is Statistics?

  • Statistics
    • The use of information from a sample (a statistic) to make inferences about a population (a parameter)
      • Often a comparison of two populations

What is the null hypothesis?

  • The null hypothesis (\(H_0\)) is a statement about a parameter.
  • It implies no difference, no change, or no relationship.
    • Examples
      • \(H_0:\ \mu_1 - \mu_2 = 0\)
      • \(H_0:\ \pi_1 - \pi_2 = 0\)
      • \(H_0:\ \rho = 0\)

What is the alternative hypothesis?

  • The alternative hypothesis (\(H_1\) or \(H_a\)) implies a difference, change, or relationship.
    • Examples
      • \(H_1:\ \mu_1 - \mu_2 \ne 0\)
      • \(H_1:\ \pi_1 - \pi_2 \ne 0\)
      • \(H_1:\ \rho \ne 0\)

Hypothesis in English instead of Greek

  • Only statisticians like Greek letters
    • Translate to simple text
    • For two group comparisons
      • Safer, more effective
    • For regression models
      • Trend, association

Use PICO

  • P = patient population
  • I = intervention
  • C = control
  • O = outcome

Example of text hypotheses (1/2)

  • “… the objective of this 78-week randomised, placebo-controlled study was to determine whether treatment with nilvadipine sustained-release 8 mg, once a day, was effective and safe in slowing the rate of cognitive decline in patients with mild to moderate Alzheimer disease.”
    • Lawlor B, Segurado R, Kennelly S, et al. Nilvadipine in mild to moderate Alzheimer disease: A randomised controlled trial. PLoS Med. 2018; 15(9): e1002660. DOI: 10.1371/journal.pmed.1002660

PICO for this study

  • P = patients with mild to moderate Alzheimer disease
  • I = Nilvadipine
  • C = placebo
  • O = cognitive function

Example of text hypotheses (2/2)

  • “… we investigated trends in BCC incidence over a span of 20 years and the associations between incident BCC and risk factors in a total population of 140,171 participants from 2 large US-based cohort studies: women in the Nurses’ Health Study (NHS; 1986–2006) and men in the Health Professionals’ Follow-up Study (HPFS; 1988–2006).”
    • Wu S, Han J, Li WQ, Li T, Qureshi AA. Basal-cell carcinoma incidence and associated risk factors in U.S. women and men. Am J Epidemiol. 2013; 178(6): 890–897. DOI: 10.1093/aje/kwt073

PICO for this study

  • P = female nurses/male health professionals
  • I = various risk factors
  • C = absence of various risk factors
  • O = presence/absence of BCC

One-sided alternatives

  • Examples
    • \(H_1:\ \mu_1 - \mu_2 \gt 0\)
    • \(H_1:\ \pi_1 - \pi_2 \gt 0\)
    • \(H_1:\ \rho \gt 0\)
  • Changes in only one direction expected
  • Changes in opposite direction uninteresting

Passive smoking controversy

  • EPA meta-analysis of passive smoking
    • Criticized for using a one-sided hypothesis
    • Samet JM, Burke TA. Turning science into junk: the tobacco industry and passive smoking. Am J Public Health. 2001;91(11):1742–1744.

What is a decision rule? (1/3)

  • Example
    • \(H_0:\ \mu_1 - \mu_2 = 0\)
    • \(H_1:\ \mu_1 - \mu_2 \ne 0\)
    • t = (\(\bar{X}_1-\bar{X}_2\)) / se
    • Accept \(H_0\) if t is close to zero.

What is a decision rule? (2/3)

  • Example
    • \(H_0:\ \pi_1 - \pi_2 = 0\)
    • \(H_1:\ \pi_1 - \pi_2 \ne 0\)
    • t = (\(\hat{p}_1-\hat{p}_2\)) / se
    • Accept \(H_0\) if t is close to zero.
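
The same idea for two proportions, as a minimal Python sketch (not the R code used in this class). The pooled standard error is an assumption, since the slide does not define se, and the counts are invented.

```python
from math import sqrt

def two_proportion_stat(events1, n1, events2, n2):
    """Return (difference in proportions, standard error, test statistic).

    Assumption: the standard error pools the two samples, which is a
    common choice when the null hypothesis says the proportions are equal.
    """
    p1 = events1 / n1
    p2 = events2 / n2
    pooled = (events1 + events2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return p1 - p2, se, (p1 - p2) / se

# Hypothetical counts: 30 of 100 events in group 1, 20 of 100 in group 2.
diff, se, t = two_proportion_stat(30, 100, 20, 100)
print(f"difference = {diff:.2f}, se = {se:.3f}, t = {t:.2f}")
```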

What is a decision rule? (3/3)

  • Example
    • \(H_0:\ \rho = 0\)
    • \(H_1:\ \rho \ne 0\)
    • t = r / se
    • Accept \(H_0\) if t is close to zero.
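
A minimal Python sketch of the correlation version (not the R code used in this class). The standard error formula, sqrt((1 - r^2) / (n - 2)), is the usual one for testing a zero correlation but is an assumption here because the slide does not define se. The data are invented.

```python
from math import sqrt
from statistics import mean

def pearson_r(x, y):
    """Pearson correlation computed from sums of squares and cross-products."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def correlation_t(x, y):
    """Return (r, standard error, t), assuming se = sqrt((1 - r^2) / (n - 2))."""
    n = len(x)
    r = pearson_r(x, y)
    se = sqrt((1 - r ** 2) / (n - 2))
    return r, se, r / se

# Hypothetical data: gestational age in weeks and number of ER visits.
gestational_age = [40, 38, 35, 39, 32, 37, 41, 34]
er_visits = [0, 1, 3, 1, 4, 1, 0, 2]

r, se, t = correlation_t(gestational_age, er_visits)
print(f"r = {r:.2f}, se = {se:.2f}, t = {t:.2f}")
```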

What is a Type I error?

  • A Type I error is rejecting the null hypothesis when the null hypothesis is true
    • False positive
    • Example involving drug approval: a Type I error is allowing an ineffective drug onto the market.
  • \(\alpha\) = P[Type I error]

What is a Type II error?

  • A Type II error is accepting the null hypothesis when the null hypothesis is false.
    • False negative result
    • Usually computed at MCD
    • An example involving drug approval: a Type II error is keeping an effective drug off of the market.
  • \(\beta\) = P[Type II error]
  • Power = \(1-\beta\)
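
Both error rates can be estimated by simulation. This is a minimal Python sketch (not the R code used in this class). The sample size, the true difference of 0.5 standard deviations, the cutoff of 1.96, and the seed are all assumptions chosen for illustration.

```python
import random
from math import sqrt
from statistics import mean, variance

random.seed(1)  # arbitrary seed so the sketch is reproducible

def rejects_null(n=50, true_diff=0.0, critical=1.96):
    """Simulate one two-group study and report whether H0 is rejected.

    Both groups are normal with standard deviation 1; group 1 is
    shifted by true_diff. The cutoff 1.96 is a large-sample approximation.
    """
    x1 = [random.gauss(true_diff, 1) for _ in range(n)]
    x2 = [random.gauss(0, 1) for _ in range(n)]
    se = sqrt(variance(x1) / n + variance(x2) / n)
    t = (mean(x1) - mean(x2)) / se
    return abs(t) > critical

n_sim = 2000

# Null hypothesis true: every rejection is a Type I error.
alpha_hat = sum(rejects_null(true_diff=0.0) for _ in range(n_sim)) / n_sim

# Null hypothesis false: every rejection is a correct decision (power).
power_hat = sum(rejects_null(true_diff=0.5) for _ in range(n_sim)) / n_sim

print(f"Estimated alpha: {alpha_hat:.3f}")
print(f"Estimated power: {power_hat:.3f}")
print(f"Estimated beta:  {1 - power_hat:.3f}")
```

The estimated alpha should land near 0.05, while power depends heavily on the assumed true difference and the sample size.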

What is a p-value?

  • Let t =
    • (\(\bar{X}_1-\bar{X}_2\)) / se, or
    • (\(\hat{p}_1-\hat{p}_2\)) / se, or
    • r / se
  • p-value = Probability of the sample result, t, or a result more extreme,
    • assuming the null hypothesis is true
  • Small p-value, reject \(H_0\)
  • Large p-value, accept \(H_0\)
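
A minimal Python sketch of a two-sided p-value (not the R code used in this class). It assumes the test statistic is approximately standard normal when the null hypothesis is true, which is a large-sample simplification. Small samples would normally use a t distribution instead.

```python
from statistics import NormalDist

def two_sided_p(t):
    """Probability of a result at least as extreme as t, in either
    direction, assuming the null hypothesis is true.

    Assumption: t is approximately standard normal under the null.
    """
    return 2 * (1 - NormalDist().cdf(abs(t)))

# Hypothetical test statistics, from close to zero to far from zero.
for t in (0.5, 1.96, 3.0):
    print(f"t = {t:4.2f}  p = {two_sided_p(t):.3f}")
```

A test statistic close to zero gives a large p-value, and one far from zero gives a small p-value, matching the decision rule on the earlier slides.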

Alternate interpretations

  • Consistency between the data and the null
    • Small value, inconsistent
    • Large value, consistent
  • Evidence against the null
    • Small, lots of evidence against the null
    • Large, little evidence against the null

What the p-value is not (1/2)

  • A p-value is NOT the probability that the null hypothesis is true.
    • P[t or more extreme | null] is different from
    • P[null | t or more extreme]
      • P[null] is nonsensical
      • \(\mu\), \(\pi\), or \(\rho\) are unknown constants (no sampling error)

What the p-value is not (2/2)

  • Not a measure FOR either hypothesis
    • Little evidence against the null \(\ne\) lots of evidence for the null
  • Not very informative if it is large
    • Need a power calculation, or
    • Narrow confidence interval
  • Not very helpful for huge data sets

A research paper computes a p-value of 0.45. How would you interpret this p-value?

  1. Strong evidence for the null
  2. Strong evidence for the alternative
  3. Little or no evidence for the null
  4. Little or no evidence for the alternative
  5. More than one answer above is correct.
  6. I do not know the answer.

Figure 1: xkcd cartoon about jelly beans and cancer

What is p-hacking?

  • Abuse of the hypothesis testing framework.
    • Run multiple tests on the same outcome
    • Test multiple outcome measures
    • Remove outliers and retest
  • Defenses against p-hacking
    • Bonferroni
    • Primary versus secondary
    • Published protocol

Break #5

  • What you have learned
    • Tests of hypothesis
  • What’s coming next
    • Confidence intervals

What is a confidence interval?

  • Range of plausible values
    • Tries to quantify uncertainty associated with the sampling process.
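
A minimal Python sketch of one common way to compute such an interval (not the R code used in this class). It assumes a normal approximation, estimate plus or minus about 1.96 standard errors for a 95% interval, and uses invented data. Small samples would normally use a t multiplier instead.

```python
from math import sqrt
from statistics import NormalDist, mean, variance

def mean_difference_ci(x1, x2, level=0.95):
    """Return (lower, upper) limits for the difference in two means.

    Assumptions: unpooled standard error and a normal multiplier.
    """
    diff = mean(x1) - mean(x2)
    se = sqrt(variance(x1) / len(x1) + variance(x2) / len(x2))
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return diff - z * se, diff + z * se

# Hypothetical measurements for a treatment group and a control group.
treatment = [12.0, 9.5, 14.2, 11.1, 10.4, 13.3, 8.9, 12.7]
control = [11.2, 10.8, 12.9, 9.7, 13.5, 10.1, 11.9, 12.2]

lower, upper = mean_difference_ci(treatment, control)
print(f"95% CI for the difference: {lower:.1f} to {upper:.1f}")
```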

Example of a confidence interval

  • Homeopathic treatment of swelling after oral surgery
    • 95% CI: -5.5 to 7.5 mm
    • Lokken P, Straumsheim PA, Tveiten D, Skjelbred P, Borchgrevink CF. Effect of homoeopathy on pain and other events after acute trauma: placebo controlled trial with bilateral oral surgery. BMJ. 1995;310(6992):1439-1442.

Confidence interval interpretation (1 of 7)

Figure 2: Interval that contains the null value

Confidence interval interpretation (2 of 7)

Figure 3: Interval entirely above the null value

Confidence interval interpretation (3 of 7)

Figure 4: Interval entirely below the null value

Confidence interval interpretation (4 of 7)

Figure 5: Interval entirely inside the range of clinical indifference

Confidence interval interpretation (5 of 7)

Figure 6: Interval partly inside/outside range of clinical indifference

Quiz question, revisited

A research paper computes a confidence interval for a relative risk of 0.82 to 3.94. This confidence interval tells you that the result is

  1. statistically significant and clinically important.
  2. not statistically significant, but is clinically important.
  3. statistically significant, but not clinically important.
  4. not statistically significant, and not clinically important.
  5. The result is ambiguous.
  6. I do not know the answer.

Confidence interval interpretation (6 of 7)

Figure 7: Confidence interval that contains the null value

Confidence interval interpretation (7 of 7)

Figure 8: Confidence interval entirely outside the range of clinical indifference
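
The interpretations in these figures can be sketched as a small Python function (not the R code used in this class). The function name and the wording of its labels are hypothetical, and the range of clinical indifference in the example (-2 to 2 mm) is invented. In practice that range comes from clinical judgment, not from the data.

```python
def interpret_interval(lower, upper, null_value, indiff_low, indiff_high):
    """Classify a confidence interval against the null value and a
    range of clinical indifference (both supplied by the analyst)."""
    if lower <= null_value <= upper:
        significance = "not statistically significant"
    else:
        significance = "statistically significant"

    if indiff_low <= lower and upper <= indiff_high:
        # Interval lies entirely inside the range of clinical indifference.
        importance = "not clinically important"
    elif upper < indiff_low or lower > indiff_high:
        # Interval lies entirely outside the range of clinical indifference.
        importance = "clinically important"
    else:
        # Interval is partly inside and partly outside the range.
        importance = "ambiguous clinical importance"
    return significance, importance

# Interval from the homeopathy example, with a hypothetical indifference range.
print(interpret_interval(-5.5, 7.5, null_value=0, indiff_low=-2, indiff_high=2))
# Two further hypothetical intervals on the same scale.
print(interpret_interval(3.0, 6.0, null_value=0, indiff_low=-2, indiff_high=2))
print(interpret_interval(0.5, 1.5, null_value=0, indiff_low=-2, indiff_high=2))
```

For a ratio such as a relative risk, the null value would be 1 rather than 0.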

Why you might prefer a confidence interval

  • Provides the same information as a p-value, plus
    • Clinical importance
    • Distinguish between
      • definitive negative result, or
      • more research is needed

Break #6

  • What you have learned
    • Confidence intervals
  • What’s coming next
    • A simple R program

Grading rubric

Summary

  • What you have learned
    • About this class
    • R and RStudio
    • Programming assignments
    • Scales of measurement
    • Tests of hypothesis
    • Confidence intervals
    • A simple R program